21 research outputs found

    Monocular depth estimation in images and sequences using occlusion cues

    When humans observe a scene, they can readily distinguish its constituent parts, reconstruct their spatial positions, and form a consistent structure. The mechanisms of visual perception have been studied since the beginnings of neuroscience, yet not all of the biological processes involved are known. In ordinary situations, humans can use three different mechanisms to estimate scene structure. The first, the so-called divergence, relies on both eyes: when objects lie in front of the observer at distances of up to about a hundred meters, subtle differences between the images formed in each eye can be used to determine depth. As distance increases, or when objects are not in the field of view of both eyes, other mechanisms must be used: visual cues and previously learned information. Although these are less accurate than divergence, humans can almost always infer the correct depth structure with them. Visual cues such as occlusion, perspective, and object size provide a great deal of information about scene structure. Prior information depends on each observer, but humans normally use it subconsciously to recognize familiar regions such as the sky, the ground, or different types of objects. In recent years, as technology has become able to handle the processing burden of vision systems, much effort has been devoted to designing automated scene interpretation systems. This thesis addresses depth estimation from a single point of view using only occlusion cues. The objective is to detect the occlusions present in a scene and combine them with a segmentation system to generate a relative depth order map of the scene. Both static and dynamic situations are explored: single images, frames within sequences, and full video sequences. When a full image sequence is available, a system that exploits motion information to recover depth structure is also designed. Results are promising and competitive with the state of the art, but there is still much room for improvement compared with human depth perception.
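One way to picture how detected occlusions could be turned into a relative depth order is a layering of the "occludes" relation. The sketch below is only an illustration under assumptions, not the method of the thesis: the function name `depth_order`, the region labels, and the idea of representing each occlusion cue as a `(front, back)` pair are all hypothetical.

```python
from collections import defaultdict, deque


def depth_order(regions, occlusions):
    """Assign each region a relative depth level (0 = closest to the viewer).

    regions:    list of region labels (hypothetical names).
    occlusions: iterable of (front, back) pairs, meaning `front` occludes `back`.
                Every label used here is assumed to appear in `regions`.
    Raises ValueError if the cues contradict each other (form a cycle).
    """
    behind = defaultdict(set)           # front -> regions it occludes
    pending = {r: 0 for r in regions}   # occluders not yet processed, per region
    for front, back in occlusions:
        if back not in behind[front]:
            behind[front].add(back)
            pending[back] += 1

    level = {r: 0 for r in regions}
    queue = deque(r for r in regions if pending[r] == 0)
    visited = 0
    while queue:
        front = queue.popleft()
        visited += 1
        for back in behind[front]:
            # A region lies at least one level behind everything that occludes it.
            level[back] = max(level[back], level[front] + 1)
            pending[back] -= 1
            if pending[back] == 0:
                queue.append(back)

    if visited != len(regions):
        raise ValueError("contradictory occlusion cues (cycle)")
    return level


if __name__ == "__main__":
    cues = [("person", "car"), ("car", "building"),
            ("person", "building"), ("building", "sky")]
    print(depth_order(["person", "car", "building", "sky"], cues))
```

Regions that are never occluded end up at level 0, and contradictory cues are reported instead of being silently ordered. A real system would also need to weigh unreliable cues, which this sketch ignores.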

    Underwater Acoustic MIMO OFDM: An experimental analysis

    Project carried out in collaboration with the Massachusetts Institute of Technology. Research efforts over the past several years have provided ample proof that orthogonal frequency division multiplexing (OFDM) is a viable alternative to the single-carrier modulation traditionally used for high-rate communications over underwater acoustic channels. The main attraction of OFDM lies in its simple implementation via FFT modulation/demodulation, which makes it a candidate for the next generation of acoustic modems.
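To illustrate why FFT-based modulation and demodulation is considered simple, here is a minimal sketch of one OFDM block over an ideal (noise-free, distortion-free) channel. It assumes QPSK mapping, a power-of-two number of subcarriers, and a cyclic prefix. The function names and parameter values are illustrative and are not taken from the project, and the pure-Python radix-2 FFT stands in for whatever FFT routine a real modem would use.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ofdm_modulate(bits, n_sub, cp_len):
    """Map 2 bits per subcarrier to QPSK, apply the IFFT, prepend a cyclic prefix."""
    if len(bits) != 2 * n_sub:
        raise ValueError("expected 2 bits per subcarrier")
    symbols = [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1])
               for i in range(0, 2 * n_sub, 2)]
    time = [v / n_sub for v in fft(symbols, inverse=True)]
    return time[n_sub - cp_len:] + time   # cyclic prefix = last cp_len samples


def ofdm_demodulate(samples, n_sub, cp_len):
    """Drop the cyclic prefix, apply the FFT, decide each QPSK symbol by sign."""
    freq = fft(samples[cp_len:cp_len + n_sub])
    bits = []
    for s in freq:
        bits += [int(s.real < 0), int(s.imag < 0)]
    return bits


if __name__ == "__main__":
    data = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
    tx = ofdm_modulate(data, n_sub=8, cp_len=2)
    print(ofdm_demodulate(tx, n_sub=8, cp_len=2) == data)
```

Over a real underwater channel the receiver would also need synchronization, channel estimation, and equalization, all of which are omitted here.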

    Monocular Depth Ordering Using Occlusion Cues

    This project proposes a system that relates the objects in an image using occlusion cues and arranges them according to depth. The system does not rely on a priori knowledge of the scene structure; instead it focuses on detecting special points, such as T-junctions and high-convexity regions, to infer the depth relationships between objects in the scene. It makes extensive use of the Binary Partition Tree (BPT) as the segmentation tool, together with a new approach to estimating T-junction candidate points. In a BPT, a bottom-up strategy, regions are iteratively merged and grown from pixels until only one region is left. At each merging step, the system estimates the junction points where three regions meet. Once the BPT is constructed and pruned, this information is used for depth ordering. Since many images may not have occlusion points formed by junctions, occlusion is also detected by examining convex shapes on region boundaries. Combining T-junctions and convexity leads to a system that relies only on low-level depth cues and involves no learning process, yet its performance is similar to that of the state of the art.
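The bottom-up merging that defines a BPT can be sketched on a tiny grayscale image. This is a simplified illustration under assumptions: the merging criterion (difference of mean gray level), the 4-connectivity, and the name `build_bpt` are chosen for the example and are not the region model or merging order used in the project. Junction estimation and pruning are left out.

```python
def build_bpt(image):
    """Build a Binary Partition Tree over a 2-D list of gray levels.

    Leaves are pixels numbered row-major; each merge creates a new node id.
    Returns (parent, root), where parent[node] is the node that absorbed it.
    """
    h, w = len(image), len(image[0])
    mean, size, neigh = {}, {}, {}
    for y in range(h):
        for x in range(w):
            i = y * w + x
            mean[i], size[i], neigh[i] = float(image[y][x]), 1, set()
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:                      # right neighbour
                neigh[i].add(i + 1)
                neigh[i + 1].add(i)
            if y + 1 < h:                      # bottom neighbour
                neigh[i].add(i + w)
                neigh[i + w].add(i)

    parent = {}
    next_id = h * w
    while len(neigh) > 1:                      # until one region is left
        # Pick the most similar pair of adjacent regions (ties broken by id).
        a, b = min(((a, b) for a in neigh for b in neigh[a] if a < b),
                   key=lambda p: (abs(mean[p[0]] - mean[p[1]]), p))
        new, next_id = next_id, next_id + 1
        size[new] = size[a] + size[b]
        mean[new] = (mean[a] * size[a] + mean[b] * size[b]) / size[new]
        neigh[new] = (neigh[a] | neigh[b]) - {a, b}
        for n in neigh[new]:                   # rewire neighbours to the new node
            neigh[n] -= {a, b}
            neigh[n].add(new)
        del neigh[a], neigh[b]
        parent[a] = parent[b] = new
    root = next(iter(neigh))
    return parent, root


if __name__ == "__main__":
    parent, root = build_bpt([[0, 0, 9],
                              [0, 0, 9]])
    print(root, parent)
```

For an image with N pixels the tree has 2N - 1 nodes, and each internal node records one merge, so cutting the tree at different heights yields partitions of different coarseness.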

    Underwater acoustic MIMO OFDM: an experimental analysis

    Performance of multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) is analyzed on an experimental shallow water acoustic channel. Different modulation levels, numbers of subcarriers and transmitters were tested over a period of two weeks. The objectives in doing so were (a) to assess the effect of environmental conditions on the system performance, (b) to determine the performance limits and the data rate supported by the existing detection methods, and (c) to investigate the possibility to push these limits by employing methods for inter-carrier interference (ICI) compensation. Funding: United States Office of Naval Research (ONR MURI Grant No. N00014-07-1-0738; ONR Grant N00014-07-1-0202).
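As a rough guide to how modulation level, number of subcarriers, and number of transmitters interact, the raw (uncoded, pilot-free) bit rate of a MIMO OFDM system can be estimated as below. The formula is the standard textbook relation. The function name and the numbers in the example are hypothetical and are not the parameters of the experiment.

```python
from math import log2


def raw_bit_rate(n_tx, n_subcarriers, mod_order, bandwidth_hz, guard_s):
    """Raw bit rate in bit/s, ignoring coding, pilots, and null subcarriers.

    Each of the n_tx transmitters sends n_subcarriers symbols of
    log2(mod_order) bits per OFDM block; one block lasts T + Tg seconds,
    with T = n_subcarriers / bandwidth_hz and Tg the guard interval.
    """
    block_s = n_subcarriers / bandwidth_hz
    bits_per_block = n_tx * n_subcarriers * log2(mod_order)
    return bits_per_block / (block_s + guard_s)


if __name__ == "__main__":
    # Hypothetical example: 2 transmitters, 1024 subcarriers, QPSK,
    # 10 kHz bandwidth, 16 ms guard interval.
    print(round(raw_bit_rate(2, 1024, 4, 10_000, 0.016)))
```

The expression shows the trade-off explored in such experiments: more subcarriers reduce the relative cost of the guard interval but lengthen each block, which makes the system more sensitive to channel variation and ICI.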

    Final Exam

    Resolve

    Hierarchical Video Representation with Trajectory Binary Partition Tree

    As an early stage of video processing, we introduce an iterative trajectory merging algorithm that produces a region-based and hierarchical representation of the video sequence, called the Trajectory Binary Partition Tree (BPT). From this representation, many analysis and graph cut techniques can be used to extract partitions or objects that are useful in the context of specific applications. In order to define trajectories and to create a precise merging algorithm, color and motion cues have to be used. Both types of information are very useful to characterize objects but behave very differently in the spatial and temporal dimensions. On the one hand, scenes and objects are rich in their spatial color distributions, but these distributions are rather stable over time. Object motion, on the other hand, presents simple structures and low spatial variability but may change from frame to frame. The proposed algorithm takes this key difference into account and relies on different models and associated metrics to deal with color and motion information. We show that the proposed algorithm outperforms existing hierarchical video segmentation algorithms and provides more stable and precise regions. Peer reviewed.
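The idea of treating color and motion with different models can be illustrated with a toy merging cost. Everything in this sketch is an assumption made for illustration: the region layout (a dict with a time-pooled mean `color` and a per-frame `motion` vector), the Euclidean distances, and the `weight` parameter are hypothetical and are not the models or metrics of the paper.

```python
def color_distance(a, b):
    """Compare time-pooled mean colors, assuming color is stable over time."""
    return sum((p - q) ** 2 for p, q in zip(a["color"], b["color"])) ** 0.5


def motion_distance(a, b):
    """Compare motion frame by frame, over the frames both regions span."""
    shared = a["motion"].keys() & b["motion"].keys()
    if not shared:
        return 0.0                      # no temporal overlap: no motion evidence
    total = 0.0
    for t in shared:
        (ax, ay), (bx, by) = a["motion"][t], b["motion"][t]
        total += ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    return total / len(shared)


def merge_cost(a, b, weight=0.5):
    """Hypothetical combined cost; lower means the regions merge earlier."""
    return weight * color_distance(a, b) + (1 - weight) * motion_distance(a, b)


if __name__ == "__main__":
    car = {"color": (200, 30, 30), "motion": {0: (1, 0), 1: (1, 0)}}
    wheel = {"color": (40, 40, 40), "motion": {0: (1, 0), 1: (1, 0)}}
    road = {"color": (45, 45, 45), "motion": {0: (0, 0), 1: (0, 0)}}
    print(merge_cost(car, wheel), merge_cost(wheel, road))
```

In the example the wheel moves like the car but has a color close to the road, so the two cues pull in opposite directions. How the two terms are weighted decides which merge happens first.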